The sudden outbreak of severe acute respiratory syndrome (SARS) in 2002 prompted the establishment of a global scientific network that set aside most of the traditional rivalries in the competitive field of virology. Within months of the outbreak, collaborative work identified the pathogen as the SARS-associated coronavirus (SARS-CoV). However, although the rapid identification of the agent was an important breakthrough, our understanding of this deadly virus remains limited. Detailed biological knowledge is crucial for the development of effective countermeasures, diagnostic tests, vaccines and antiviral drugs against SARS-CoV. This article reviews the present state of molecular knowledge about SARS-CoV from the perspectives of comparative genomics, the molecular biology of viral genes, evolution, and epidemiology, and describes the diagnostic tests and antiviral drugs developed so far on the basis of the available molecular information.
Introduction


The first SARS case was reported in late 2002 in China's Guangdong Province (1). The disease was contagious and spread rapidly, resulting in an outbreak in Hong Kong in mid-February 2003 and in other outbreaks elsewhere in the world. At the end of March 2003, a virus of the family Coronaviridae was identified as the causative agent of the disease (2-4). This identification was confirmed by the World Health Organization, and the virus was designated the SARS-associated coronavirus (SARS-CoV). During the SARS outbreaks in 2002 and 2003, SARS cases were identified in 19 countries, and in total 8,605 individuals became
infected, of whom 774 died (http://www.who.int/csr/sars/country/table2003_09_23/en/).

In addition to its cost in human lives, the SARS outbreak had a great impact on the health care systems and economies of Hong Kong and other affected regions. In Hong Kong, the estimated economic loss was about HK$46 billion (US$5.9 billion; ref. 5). The possibility that SARS-CoV can be transmitted between humans without reinforcement from the animal reservoir (5), together with the ability of the virus to infect multiple cell types (6) and animal species (7), further increased the epidemiological burden of the SARS pandemic. Although the spread of the virus appeared to have been contained by July 2003 through rigorous quarantine measures (http://www.who.int/csr/sars/country/table2003_09_23/en/), the virus may still be circulating in its animal reservoir, and its return cannot be ruled out (8-10). Because of this possibility, better monitoring of SARS outbreaks through accurate diagnostic tests and the development of effective antiviral therapies are urgently required. Both depend on better molecular knowledge of SARS-CoV, and such research is therefore of vital importance if the community is to be properly prepared for a possible recurrence of SARS.

Molecular Biology of SARS-CoV 

Molecular characterization of the SARS-CoV genome

ing serological techniques, virus isolation by cell culture, and electron microscopy (2,10). Both molecular and conventional approaches were employed for the initial characterization of the SARS pathogen (2). Peiris et al (2) first isolated the virus in tissue culture and then obtained a 646-bp genomic fragment by RT-PCR using degenerate primers; this fragment showed more than 50% homology to the RNA polymerase gene of bovine coronavirus (BCV) and murine hepatitis virus (MHV). The use of a gene chip further supported a coronavirus as a possible cause of SARS (11).
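Degenerate primers, such as those used to obtain the first 646-bp fragment, encode alternative bases at some positions using IUPAC ambiguity codes, so one primer "name" is really a pool of sequences. The Python sketch below expands such a primer into the concrete sequences it represents and reports the pool size. The primer shown is a hypothetical toy example, not one of the primers actually used by Peiris et al.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

def degeneracy(primer: str) -> int:
    """Number of distinct sequences in the primer pool."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n

# Hypothetical toy primer: two ambiguous positions (R, Y) give 2 x 2 = 4 variants.
print(expand_degenerate("GARYC"))  # ['GAACC', 'GAATC', 'GAGCC', 'GAGTC']
print(degeneracy("GARYC"))         # 4
```

Highly degenerate primers widen the range of related viruses that can be amplified, at the cost of diluting each individual sequence in the pool.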

Soon after the identification of SARS-CoV, laboratories began to investigate the phylogenetic relationship between the virus and other members of the same family through extensive comparison of their genome sequences. In mid-April 2003, the British Columbia Cancer Agency (BCCA) Genome Sciences Centre in Canada (12), the Centers for Disease Control and Prevention in the United States (13) and the University of Hong Kong (14) announced, at nearly the same time, that they had determined the complete genome sequence of SARS-CoV from isolates obtained in their respective regions (15). The independent sequencing efforts all showed that the genome is a polyadenylated RNA of 29.7 kb. Comparative analysis with other coronaviruses showed that the genome organization is very similar to that of previously characterized coronaviruses, with the gene order (from the 5' end) replicase (R), spike (S), envelope (E), membrane (M) and nucleocapsid (N), together with a few accessory genes or motifs located between the structural genes and in the 3' untranslated region (UTR), which may not be necessary for viral replication (12). The replicase gene, consisting of two open reading frames (ORFs), 1a and 1b, covers more than two-thirds of the genome and is predicted to encode two polyproteins (12-14), whose processed products regulate both replication of the positive-stranded genomic RNA and the subsequent transcription of a nested set of eight subgenomic (sg) mRNAs (Table 1; ref. 16), a transcription strategy common to coronaviruses (17-21).


Table 1 Features of SARS-CoV Genome Sequence and Subgenomic Transcripts

| g/sg mRNA | ORF (Thiel et al.) | ORF (Zeng et al.) | ORF (Marra et al.) | ORF (Rota et al.) | Start-End | No. of a.a. | No. of bases | Frame |
|---|---|---|---|---|---|---|---|---|
| mRNA 1 | ORF 1a | ORF 1a | ORF 1a | ORF 1a | 265-13,398 | 4,382 | 13,149 | +1 |
| mRNA 1 | ORF 1b | ORF 1b | ORF 1b | ORF 1b | 13,398-21,485 | 2,628 | 7,887 | +3 |
| mRNA 2 | S protein | S protein | S protein | S protein | 21,492-25,259 | 1,255 | 3,768 | +3 |
| mRNA 3 | ORF 3a | X1 | ORF 3 | X1 | 25,268-26,092 | 274 | 825 | +2 |
| mRNA 3 | ORF 3b | N/R | ORF 4 | X2 | 25,689-26,153 | 154 | 465 | +3 |
| mRNA 4 | E protein | N/R | E protein | E protein | 26,117-26,347 | 76 | 231 | +2 |
| mRNA 5 | M protein | M protein | M protein | M protein | 26,398-27,063 | 221 | 666 | +1 |
| mRNA 6 | ORF 6 | N/R | ORF 7 | X3 | 27,074-27,265 | 63 | 192 | +2 |
| mRNA 7 | ORF 7a | X2 | ORF 8 | X4 | 27,273-27,641 | 122 | 369 | +3 |
| mRNA 7 | ORF 7b | N/R | ORF 9 | N/R | 27,638-27,772 | 44 | 135 | +2 |
| mRNA 8 | ORF 8a | X3 | ORF 10 | N/R | 27,779-27,898 | 39 | 120 | +2 |
| mRNA 8 | ORF 8b | N/R | ORF 11 | X5 | 27,864-28,118 | 84 | 255 | +3 |
| mRNA 9 | N protein | N protein | N protein | N protein | 28,120-29,388 | 422 | 1,269 | +1 |
| mRNA 9 | ORF 9b | N/R | ORF 13 | N/R | 28,130-28,426 | 98 | 297 | +2 |
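The numeric columns of Table 1 are linked by simple arithmetic: an ORF spanning Start-End (1-based, inclusive, stop codon included) has End - Start + 1 bases and (bases / 3) - 1 amino acids, and its frame is ((Start - 1) mod 3) + 1. The Python sketch below checks these relations for the S, E, M and N rows, assuming that the listed coordinates include the stop codon. The two replicase rows do not satisfy these relations as printed (ORF 1b, for instance, spans 8,088 nucleotides by its coordinates), so they are left out.

```python
from dataclasses import dataclass

@dataclass
class OrfRow:
    name: str
    start: int  # 1-based first nucleotide of the start codon
    end: int    # 1-based last nucleotide of the stop codon

    @property
    def bases(self) -> int:
        return self.end - self.start + 1

    @property
    def amino_acids(self) -> int:
        # One codon per residue; the final codon is the stop codon.
        return self.bases // 3 - 1

    @property
    def frame(self) -> int:
        # Frames +1, +2, +3 begin at genome positions 1, 2, 3 (mod 3).
        return (self.start - 1) % 3 + 1

# Start-End values transcribed from Table 1.
rows = [
    OrfRow("S protein", 21_492, 25_259),
    OrfRow("E protein", 26_117, 26_347),
    OrfRow("M protein", 26_398, 27_063),
    OrfRow("N protein", 28_120, 29_388),
]

for r in rows:
    assert r.bases % 3 == 0, f"{r.name}: not a whole number of codons"
    print(f"{r.name}: {r.bases} bases, {r.amino_acids} aa, frame +{r.frame}")
```

For these rows the computed values reproduce the bases, amino acid counts and frames listed in the table.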


SARS-CoV protein products 
5' and 3' UTR 

The 5' UTR of the SARS-CoV genome was characterized by 5' rapid amplification of cDNA ends (5' RACE; ref. 14) and Northern blot assays (13,16,22). These procedures elucidated the leader sequence and the transcription regulatory sequence (TRS). The leader sequence found in the viral sg mRNA transcripts is at least 72 nucleotides long. Alignment of the leader sequence with the 5' ends of the eight sg mRNAs revealed a minimal consensus TRS, 5'-ACGAAC-3', which acts as a signal sequence in the discontinuous synthesis of sg mRNAs. The degree of sequence variation flanking the TRS showed no clear relationship with the abundance of the sg mRNAs (22). A highly conserved 32-nucleotide s2m motif, previously described in avian infectious bronchitis virus (AIBV), was also identified in the 3' region of the genome (12-14).
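Because each sg mRNA is initiated at a copy of the TRS core, a first-pass search for candidate TRS sites is a simple motif scan. The Python sketch below reports every occurrence of the core 5'-ACGAAC-3' in a sequence, accepting either RNA (U) or DNA (T) letters. The input is a short made-up sequence, and a real analysis would also score the similarity of the flanking bases to the leader TRS, which this sketch ignores.

```python
TRS_CORE = "ACGAAC"  # minimal consensus TRS core reported for SARS-CoV

def find_trs_sites(genome: str, core: str = TRS_CORE) -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) core match."""
    genome = genome.upper().replace("U", "T")
    core = core.upper().replace("U", "T")
    hits = []
    pos = genome.find(core)
    while pos != -1:
        hits.append(pos + 1)
        pos = genome.find(core, pos + 1)
    return hits

# Hypothetical toy sequence containing two copies of the core.
print(find_trs_sites("GGACGAACTTTTTACGAACATG"))  # [3, 14]
```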

Replicase Gene 

The replicase gene of SARS-CoV encodes at least two proteins as a result of proteolytic processing of the large polyprotein (ORF 1a and 1b; ref. 16). Translation of the ORF 1b segment of this polyprotein requires a -1 ribosomal frameshift mediated by a putative "slippery" sequence and a putative pseudoknot structure (16). Two functional domains, a papain-like cysteine proteinase (PL2pro) and a 3C-like cysteine proteinase (3CLpro), were identified experimentally and are responsible for processing the polyprotein into 16 subunits (16,22,23). A 375-a.a. domain unique to SARS-CoV was identified upstream of the PL2pro domain; it has no counterpart in any other known coronavirus (16). In addition, seven putative regions encoding RNA-processing enzymes were identified, namely RNA-dependent RNA polymerase (RDRP), RNA helicase (HEL), poly(U)-specific endoribonuclease (XendoU), 3'-to-5' exonuclease (ExoN), S-adenosylmethionine-dependent ribose 2'-O-methyltransferase (2'-O-MT), adenosine diphosphate-ribose 1''-phosphatase (ADRP) and cyclic phosphodiesterase (CPD; ref. 16).
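The effect of a -1 ribosomal frameshift on codon boundaries can be illustrated with a simplified model. The Python sketch below reads codons in the initial frame and, once the in-frame reading has passed a slippery heptamer of the form X XXY YYZ, resumes one nucleotide upstream, at Z. It assumes the heptamer UUUAAAC, the slippery sequence commonly reported at coronavirus ORF 1a/1b junctions; it does not model the pseudoknot, which in reality governs how often the shift occurs, and the toy sequence is invented.

```python
SLIPPERY = "UUUAAAC"  # assumed slippery heptamer (X XXY YYZ pattern)

def read_codons_with_minus1_shift(rna: str, start: int = 0,
                                  slippery: str = SLIPPERY) -> list[str]:
    """Split `rna` into codons from 0-based `start`, applying one -1 shift.

    In the X XXY YYZ model the tRNAs paired with XXY and YYZ slip back by
    one base, so the next codon begins at Z instead of after it.
    """
    codons: list[str] = []
    i = start
    shifted = False
    while i + 3 <= len(rna):
        codons.append(rna[i:i + 3])
        i += 3
        # Fire only when the heptamer ends exactly at a codon boundary.
        if not shifted and i >= len(slippery) and rna[i - len(slippery):i] == slippery:
            i -= 1  # continue reading in the -1 frame
            shifted = True
    return codons

toy = "AUGGC" + "UUUAAAC" + "GGGCCC"
print(read_codons_with_minus1_shift(toy))
# ['AUG', 'GCU', 'UUA', 'AAC', 'CGG', 'GCC']
```

In genome coordinates, a -1 shift moves reading from frame +1 to frame +3, which matches the frames listed for ORF 1a and ORF 1b in Table 1.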

Genome expression starts with the translation of the two polyproteins from ORF 1a and 1b. The two proteinases, PL2pro and 3CLpro, then carry out the proteolytic processing of the two polyproteins into 16 units: PL2pro is responsible for the N-proximal cleavages and 3CLpro for the C-proximal cleavages. Among the products released is the helicase. Purified helicase showed ATPase activity and DNA duplex-unwinding activity, confirming that the protein is a functional helicase (16,24).
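The role of 3CLpro can be made concrete with a crude site scan. Coronavirus 3CLpro cleavage sites typically carry Gln at P1, often with Leu at P2 and a small residue (Ser, Ala or Gly) at P1'. The Python sketch below uses that simplified Leu-Gln|(Ser/Ala/Gly) rule to propose cut points and split a sequence accordingly. The rule is an assumption chosen for illustration (real site prediction tolerates other P2 residues and considers structural context), and the input is a made-up peptide, not part of the SARS-CoV polyprotein.

```python
import re

# Assumed, simplified 3CLpro rule: Leu-Gln at P2-P1, small residue at P1'.
# The zero-width pattern matches the cut point between P1 and P1'.
SITE = re.compile(r"(?<=LQ)(?=[SAG])")

def candidate_3clpro_cuts(polyprotein: str) -> list[int]:
    """0-based positions of candidate cuts (index of the P1' residue)."""
    return [m.start() for m in SITE.finditer(polyprotein)]

def cleave(polyprotein: str) -> list[str]:
    """Split the sequence at every candidate cut."""
    bounds = [0] + candidate_3clpro_cuts(polyprotein) + [len(polyprotein)]
    return [polyprotein[a:b] for a, b in zip(bounds, bounds[1:])]

toy = "MKTLQSGAVLQAPFLQGKLQ"  # hypothetical peptide
print(candidate_3clpro_cuts(toy))  # [5, 11, 16]
print(cleave(toy))                 # ['MKTLQ', 'SGAVLQ', 'APFLQ', 'GKLQ']
```

The final LQ is not cut because no P1' residue follows it.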

S Gene 

Together with the M protein, the spike protein is believed to be incorporated into the viral envelope before the mature virion is released (17). Initial analysis of the 1,255-a.a. peplomer protein revealed a possible signal peptide that would likely be cleaved between residues 13 and 14 (12). The protein is predicted to contain a receptor-binding unit (S1) in the N-terminal region (14,25-27) and a transmembrane unit (S2) in the C-terminal region (13,14,25,27). Molecular modeling of the S1 and S2 subunits of the spike glycoprotein (26,28) suggested that S1 consists mainly of antiparallel β-sheets with dispersed α and β regions, whereas three domains were identified in the S2 unit. Confidence in the predicted models was strengthened by the good correlation between predicted accessibility and hydropathy profiles and by the correct placement of the N- and O-glycosylation sites and most of the disulfide bridges. Whether the N-glycosylation sites determined experimentally, by tryptic digestion and PNGase treatment of purified spike protein followed by time-of-flight (TOF) mass spectrometry (29), are correctly located in the proposed model remains to be clarified. As for biological activity, the receptor used by SARS-CoV was initially unknown: comparative genomics revealed no significant similarity between its S1 domain and those of other human coronaviruses, implying that these viruses use different receptors for cell entry (12). Subsequently, angiotensin-converting enzyme 2 (ACE2) was shown to be a functional receptor for SARS-CoV in vitro. Syncytia formed in cell cultures expressing ACE2 and the SARS-CoV S1 domain, and their formation could be inhibited by anti-ACE2 antibody (30). Fine mapping of the N-terminal unit of the spike protein indicates that the receptor-binding domain probably lies between residues 303 and 537 (31).
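Because Table 1 gives the genome coordinates of the S gene, residue numbers in the spike protein can be converted to genome positions: residue r of an ORF whose start codon begins at nucleotide s occupies nucleotides s + 3(r - 1) to s + 3r - 1. The Python sketch below applies this to the putative receptor-binding domain (residues 303-537), assuming the Table 1 S coordinates (21,492-25,259) and residue numbering from the initiator methionine, that is, including the signal peptide.

```python
S_START = 21_492  # first nucleotide of the S start codon (Table 1)

def residue_to_nt(residue: int, orf_start: int = S_START) -> tuple[int, int]:
    """1-based genome span of the codon encoding a 1-based residue."""
    first = orf_start + 3 * (residue - 1)
    return first, first + 2

def residue_range_to_nt(first_res: int, last_res: int,
                        orf_start: int = S_START) -> tuple[int, int]:
    """Genome span covering a 1-based, inclusive residue range."""
    return residue_to_nt(first_res, orf_start)[0], residue_to_nt(last_res, orf_start)[1]

print(residue_range_to_nt(303, 537))  # (22398, 23102): putative RBD
print(residue_to_nt(14))              # (21531, 21533): first residue after the signal peptide
```

As a consistency check, the codon for the last residue (1,255) ends at 25,256, leaving 25,257-25,259 for the stop codon, which matches the end coordinate in Table 1.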

ORF 3a 

The product of ORF 3a shows no homology to any known protein (12,14). In addition to three predicted transmembrane domains, the protein is likely to contain a signal peptide or a cleavage site (12). Its exact function is yet to be determined, although its C-terminal region may have ATP-binding properties (12).

E Gene 

The envelope protein of SARS-CoV is thought to be a component of the viral envelope. Topology prediction suggested that the E protein is a type II membrane protein with its hydrophilic C-terminal domain exposed on the virion surface. Comparative protein sequence analysis suggested that the SARS-CoV E protein resembles the E protein of MHV (12,32,33).


Geno., Prot. & Bioinfo. Vol. 1 No. 4 November 2003 


249 



Molecular Advances in SARS-CoV 


M Gene 

The matrix glycoprotein is not likely to be cleaved (12) and contains three putative transmembrane domains (12-14). Its hydrophilic domain is believed to interact with the nucleocapsid protein and is located inside the virus particle (12). Linear epitope mapping of the M protein with synthetic peptides revealed that amino acid residues 2,137-2,158 reacted with SARS patient sera by ELISA, implying that the M protein can potentially induce an immune response (34,35).
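Linear epitope mapping of this kind usually relies on a library of overlapping synthetic peptides that tiles the whole protein, so that any linear epitope no longer than the overlap lies entirely within at least one peptide. The Python sketch below generates such a tiling. The peptide length and offset are hypothetical parameters, not those used in the cited studies (34,35), and the input is a placeholder string rather than the M protein sequence.

```python
def overlapping_peptides(protein: str, length: int = 20,
                         step: int = 10) -> list[tuple[int, str]]:
    """Tile `protein` with peptides of `length` residues every `step` residues.

    Returns (1-based start, peptide) pairs. An extra C-terminal peptide is
    added when the regular tiling would miss the last residues.
    """
    if length >= len(protein):
        return [(1, protein)]
    starts = list(range(0, len(protein) - length + 1, step))
    if starts[-1] + length < len(protein):
        starts.append(len(protein) - length)
    return [(s + 1, protein[s:s + length]) for s in starts]

toy = "ABCDEFGHIJK"  # placeholder sequence
for start, pep in overlapping_peptides(toy, length=4, step=3):
    print(start, pep)
# 1 ABCD / 4 DEFG / 7 GHIJ / 8 HIJK
```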

ORF 7a and 8a

As with ORF 3a, sequence homology searches yielded no significant match to any existing protein, but a cleavage site (between residues 15 and 16) and a transmembrane helix were predicted. ORF 7a is predicted to encode a type I membrane protein (12).

N gene 

The N gene sequence shows high homology with the nucleocapsid protein genes of other coronaviruses. A putative short lysine-rich nuclear localization signal (KTFPPTEPKKDKKKKTDEAQ) was identified (12). A potential, well-conserved RNA interaction domain was also identified in the middle region of the protein, where its basic nature may assist RNA binding (12,14). The N protein was reported to activate the AP-1 signal transduction pathway, suggesting that it may play a role in regulating the host cell cycle (36). Apart from its possible role in pathogenicity, the N protein is also believed to be the most abundant viral antigen in the host during infection, making it an excellent candidate for diagnostic purposes. The linear epitopes of the protein have been mapped (35,37,38), and the possibility of using these antigenic peptides or recombinant proteins for diagnosis has been discussed.
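The "lysine-rich" and "basic" descriptions can be quantified with a simple residue count. The Python sketch below counts lysines in the putative NLS quoted above and estimates its net side-chain charge near neutral pH as (Lys + Arg) - (Asp + Glu), ignoring His and the peptide termini. This approximation is an assumption made for illustration, not the method used in the cited analyses.

```python
NLS = "KTFPPTEPKKDKKKKTDEAQ"  # putative N-protein NLS quoted in the text

def basic_profile(peptide: str) -> dict[str, float]:
    counts = {aa: peptide.count(aa) for aa in "KRDE"}
    return {
        "length": len(peptide),
        "lys": counts["K"],
        "lys_fraction": counts["K"] / len(peptide),
        # Crude net charge near pH 7: basic minus acidic side chains.
        "net_charge": (counts["K"] + counts["R"]) - (counts["D"] + counts["E"]),
    }

print(basic_profile(NLS))
# {'length': 20, 'lys': 7, 'lys_fraction': 0.35, 'net_charge': 3}
```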

Phylogenetic analysis of the SARS-CoV 

Protein sequence based on individual ORFs 

Phylogenetic relationships with other coronaviruses were inferred by comparing the deduced amino acid sequences of the replicase gene and the four structural genes (S, E, M, N) (12-14). The different research groups reached similar conclusions: SARS-CoV forms a distinct cluster, a fourth group within the Coronaviridae, a notion supported by high bootstrap values (above 90%). It was therefore concluded that SARS-CoV is phylogenetically equidistant from all other known coronaviruses. Moreover, a similarity plot of the whole-genome alignment with other coronaviruses revealed no detectable recombination event (14). These findings suggest that SARS-CoV is neither a mutant nor a recombinant of existing coronaviruses, and that the possibility of the virus having been produced by genetic engineering can be excluded, as it is highly unlikely that an infectious coronavirus with 50% of its genome differing from existing coronaviruses could be generated (9).
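The distance-based reasoning behind such trees, and the bootstrap values used to support them, can be sketched in a few lines. The Python example below computes the proportion of differing sites (p-distance) between aligned amino acid sequences, skipping gap positions, and draws a bootstrap replicate by resampling alignment columns with replacement. The three short sequences are invented; real analyses use full alignments, model-corrected distances and a tree-building method applied to each of many replicates, all of which this sketch leaves out.

```python
import random

def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences, ignoring gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(aln: dict[str, str]) -> dict[tuple[str, str], float]:
    names = sorted(aln)
    return {(m, n): p_distance(aln[m], aln[n])
            for i, m in enumerate(names) for n in names[i + 1:]}

def bootstrap_replicate(aln: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(aln.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in aln.items()}

# Hypothetical toy alignment.
aln = {"virusA": "MKVLA-GT", "virusB": "MKVLSAGT", "virusC": "MRILSAGS"}
print(distance_matrix(aln))
replicate = bootstrap_replicate(aln, random.Random(1))
print(distance_matrix(replicate))
```

The bootstrap value of a cluster is the percentage of replicates whose tree recovers that cluster; values above 90% are conventionally taken as strong support.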

Protein sequence based on functional domain of the replicase gene

Snijder et al (22) conducted an extensive phylogenetic analysis of the SARS-CoV replicase gene using torovirus as an outgroup. These authors considered the phylograms constructed from different SARS-CoV proteins to be unconvincing, and suggested that SARS-CoV could be clustered into an existing group. Because the structural and other accessory genes can be gained or lost during evolution and are poorly conserved, the authors chose the replicase gene for their phylogenetic analysis and reconstructed the relationships as a tree rooted with the torovirus outgroup. The phylogram was built from the fused replicase sequence after manual adjustment of the alignment and exclusion of poorly conserved regions. The resulting tree showed that SARS-CoV is most closely related to group 2 coronaviruses, and the virus was assigned to a new subgroup, 2b. The authors further pointed out that SARS-CoV contains homologues of domains unique to group 2 coronaviruses in the nsp1 and nsp3 (PL2pro) regions, and argued that the differences in the sequence and arrangement of the 3'-located ORFs, and the lack of antigenic cross-reactivity, do not contradict their conclusion, as similar phenomena have been observed among group 1 coronaviruses.

Using a Bayesian phylogenetic inference approach, a recombination breakpoint within the SARS-CoV RDRP was identified at the protein sequence level (39). Phylogenetic analysis of the 5' end of the domain indicated that it might originate from the common ancestor of all existing coronaviruses, whereas the same analysis of the 3' end gave a different tree topology suggesting a sister relationship with group 3 avian coronaviruses. These results suggest either that a recombination event occurred between the common ancestor of SARS-CoV and that of other coronaviruses, or that the 5' fragment of SARS-CoV diverged before the divergences between and within the other known coronavirus groups while the 3' fragment diverged more recently (39).

Genome organization 

Based on antigenic cross-reactivity and genome characteristics, existing coronaviruses are generally classified into three subgroups (40). All coronaviruses share a very similar organization of their functional and structural genes, but the arrangement of the so-called non-essential genes differs markedly among the subgroups. Group 1 coronaviruses are mainly characterized by the presence of ORFs following the N gene. Group 2 coronaviruses have two additional ORFs, non-structural protein 2 (ns2) and the HE gene, located between ORF 1b and the S gene. Only group 3 species have ORFs located between the M and N genes and a conserved stem-loop motif, s2m, in their 3' UTR (Figure 1). Accessory ORFs are found between the S and E genes in all of the subgroups. However, these accessory ORFs within the S-E intergenic region do not appear to be homologous between subgroups, although they are conserved within subgroups. These accessory genes evolve considerably faster than the essential genes, which offers an alternative means of assessing the phylogeny of the coronavirus family.
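The group-specific rules above amount to asking which accessory ORFs sit in which intergenic region. The Python sketch below encodes simplified gene orders as lists and reports the accessory elements between pairs of essential genes, plus whether the s2m motif is present. The two layouts are schematic placeholders loosely following the description (ns2 and HE between ORF 1b and S for group 2; ORFs between M and N and s2m for group 3), not complete annotations of any real virus.

```python
ESSENTIAL = {"1a", "1b", "S", "E", "M", "N"}

def between(order: list[str], left: str, right: str) -> list[str]:
    """Accessory elements located between two essential genes (5' to 3')."""
    i, j = order.index(left), order.index(right)
    return [g for g in order[i + 1:j] if g not in ESSENTIAL]

def after(order: list[str], gene: str) -> list[str]:
    """Accessory elements downstream of an essential gene."""
    return [g for g in order[order.index(gene) + 1:] if g not in ESSENTIAL]

def summarize(order: list[str]) -> dict[str, object]:
    return {
        "1b-S": between(order, "1b", "S"),
        "S-E": between(order, "S", "E"),
        "M-N": between(order, "M", "N"),
        "after N": [g for g in after(order, "N") if g != "s2m"],
        "s2m": "s2m" in order,
    }

# Schematic, hypothetical layouts.
group2_like = ["1a", "1b", "ns2", "HE", "S", "4a", "E", "M", "N"]
group3_like = ["1a", "1b", "S", "3a", "3b", "E", "M", "5a", "5b", "N", "s2m"]
print(summarize(group2_like))
print(summarize(group3_like))
```

Representing each genome as an ordered list keeps the comparison independent of gene lengths, which mirrors the not-to-scale convention of Figure 1.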


[Figure 1: genome maps, from the 5' cap, of FECV, FIPV, CCV, TGEV, PRCV, PEDV, HCV 229E, MHV, RCV, BCV, PHEV, HCV OC43, TCV, IBV and SARS-CoV, with accessory ORFs/motifs labeled ns2, HE, 3a, 3b, 5a, 5b and s2m]


Fig. 1 Comparison of accessory genes among all known coronaviruses. The open boxes represent essential ORFs (not drawn to scale), while the shaded boxes represent accessory ORFs/motifs. Homologous ORFs are shaded with the same pattern. The names of the group-specific accessory ORFs were unified and denoted at the top of the corresponding subgroup ORFs. The X (black cross) represents the absence of ORFs within the region. The genome organization and accessory ORFs of these CoVs were confirmed, except for the ns2 of PHEV. All the accessory genes are group-specific and highly diverged within subgroups, particularly within the S-E intergenic region. SARS-CoV has a genome structure very similar to that of group 3 CoVs, with two ORFs located between the M and N genes and a conserved stem-loop motif, s2m, in the 3' UTR. Although the ORF 5a/5b of group 3 CoVs and ORF 5/6 of SARS-CoV are in homologous locations, they do not have any significant sequence homology. FECV: feline enteric coronavirus (41-45); FIPV: feline infectious peritonitis virus (41-45); CCV: canine coronavirus (43,46); TGEV: transmissible gastroenteritis virus (41,45,48); PRCV: porcine respiratory coronavirus (41,45,48); PEDV: porcine epidemic diarrhea virus (49,50); HCV 229E: human coronavirus 229E (49,51); MHV: murine hepatitis virus (52,53); RCV: rat coronavirus (54); BCV: bovine coronavirus (55); PHEV: porcine hemagglutinating encephalomyelitis virus (56); HCV OC43: human coronavirus OC43 (57,58); TCV: turkey coronavirus (59-61); IBV: infectious bronchitis virus (62-64).
Based on the confirmed SARS-CoV ORFs described above, a comparison of all homologous accessory and essential ORFs of known coronaviruses with the novel SARS-CoV is shown in Figure 1. The results do not suggest that the coding regions are the product of a recent recombination event between any of the known coronaviruses, consistent with the conclusion reached by Holmes (9). Interestingly, the SARS-CoV genome has an organization very similar to that of the group 3 avian coronaviruses (IBV and TCV), with three ORFs within the M-N intergenic region, two ORFs between the S and E genes (65), and a stem-loop motif, s2m, in the 3' UTR. The presence of s2m and the finding that the 3' fragment of the SARS-CoV RDRP clusters with group 3 in phylogenetic analysis (39) suggest that the avian coronaviruses and SARS-CoV might share a common ancestor that gained s2m through a single horizontal RNA transfer event from an unrelated virus family, as the astroviruses did (39,66). The alternative possibility, that a common coronavirus ancestor gained the motif and that it was subsequently lost from all lineages except group 3 and SARS-CoV, cannot be excluded. Pairwise sequence homology searches among the accessory ORFs in the S-E intergenic region show no significant homology between SARS-CoV and any other coronavirus (12-14), although these ORFs are homologous within subgroups. The ORF 5a/5b of group 3 coronaviruses and ORFs 6-8 of SARS-CoV are in homologous locations, but they have no significant sequence homology. These results imply that, although SARS-CoV and group 3 coronaviruses have very similar genome organizations, they might have acquired these accessory genes through several RNA recombination events with different hosts or viral sources. The accessory ORFs are group-specific but are usually truncated to different extents within a subgroup (Figure 1). Another interesting observation is the genetic diversity of the S-E intergenic region. Usually two or three group-specific ORFs are found within this region in each subgroup, but only one confirmed ORF (ORF 3) is found in this region of the SARS-CoV genome (12-14,16,22). The diversity (mainly due to truncation and deletion) of these S-E intergenic ORFs within the subgroups is higher than that of other accessory ORFs. Their sequence divergence implies that their common ancestors might have acquired these ORFs by RNA recombination, a common phenomenon in large RNA viruses (67,68), rather than through mutation of a single ancestral RNA segment (9). Typical examples are the acquisition of the HE gene from influenza C virus (69) and recombination events with Berne virus at the HE-ns2 region (52).

Based on the recombination and truncation events within these intergenic regions, the phylogenetic relationships between SARS-CoV and other coronaviruses have been reconstructed (Figure 2). At least four subgroup common ancestors (nos. 1-4 in Figure 2) acquired their S-E intergenic ORFs and other group-specific ORFs through several independent RNA recombination events. Moreover, these ORFs tend to be deleted or truncated when viruses cross species barriers within the subgroups, e.g. ORF 4a/b in group 2 (54-58) and ORF 3a/b and ORF 7a/b in group 1 (41,42,47,48,50,70-72). The deletion of these redundant accessory ORFs is likely to be a result rather than a cause of crossing host barriers, as at least four studies (7,73-75) have shown that coronavirus host range and tropism are determined by the receptor-binding domain of the spike glycoprotein.

Recombination is a common phenomenon in various virus families (67), particularly among large RNA viruses, where it may serve as a means of shedding the deleterious errors accumulated during genome replication (68). Recombination events within the coronavirus family (70,76,77) and with unrelated virus families (52,66,69) have been reported. The diversity of the redundant accessory genes appears to have been accompanied by extensive genome rearrangement through heterologous or homologous RNA recombination, providing useful information for coronavirus taxonomy. From this point of view, SARS-CoV is clearly a new and unique member of the coronavirus family. The divergence of these redundant ORFs between SARS-CoV and other known coronaviruses suggests that SARS-CoV might have been circulating in other animal hosts long before its emergence, and crossed into humans only recently, through either a sudden bottleneck mutation event or an RNA recombination event with an unknown source.

Animal reservoir 

It has been demonstrated that SARS-CoV can infect macaques, which display symptoms similar to the clinical signs of SARS patients (78), and can replicate in cats and ferrets (79).
Fig. 2 Phylogenetic relationship of all known coronaviruses based on the putative RNA recombination events at the accessory ORFs. At least four subgroup common ancestors (nos. 1-4) acquired their redundant accessory ORFs through several independent RNA recombination events. Group 3 CoVs and SARS-CoV may have a common ancestor (no. 0) that gained s2m through a single horizontal RNA transfer event from the unrelated astrovirus family (see text). These accessory ORFs tend to be deleted or truncated when viruses cross species barriers within the subgroups. The abbreviations of the viral species are given in the legend of Figure 1.


These findings, together with the evidence from phylogenetic studies, have prompted efforts to identify the animal reservoir of the virus. Recent studies of domestic and wild animals in Guangdong, where the SARS epidemic was first reported, detected SARS-CoV in several animals found in a livestock market, including Himalayan palm civets (Paguma larvata) and raccoon dogs (Nyctereutes procyonoides; ref. 80), although another group failed to identify any SARS-CoV after screening more than 60 animal species (81). The genome sequences of the coronaviruses isolated from these animals are almost identical (99.8%) to that of human SARS-CoV, revealing an extremely close phylogenetic relationship. Another major finding of the sequence analysis concerned a 29-bp sequence upstream of the N gene that is present in the animal viruses; among the human isolates available from GenBank, it was retained only in one Guangdong isolate (GD01, accession number 278489) and is deleted in the others. The presence of this sequence fuses the two ORFs identified in mRNA 8 into a single ORF, but its biological significance remains to be elucidated (<§). Comparison of the S gene nucleotide sequences of the animal and human SARS-CoV revealed 11 consistent nucleotide signature mutations that appeared to distinguish them.
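Two numbers in this passage can be put in perspective with simple arithmetic. At 99.8% identity over a genome of about 29.7 kb, the animal and human viruses differ at roughly 60 positions. A 29-nucleotide insertion or deletion is not a multiple of three, so it shifts the reading frame downstream of it, which is why its presence or absence changes the ORF structure of mRNA 8. The Python sketch below performs both calculations, assuming ungapped identity over the full 29.7-kb length.

```python
GENOME_LENGTH = 29_700  # approximate SARS-CoV genome length (29.7 kb)

def expected_differences(identity_percent: float,
                         length: int = GENOME_LENGTH) -> float:
    """Number of mismatched positions implied by a percent identity."""
    return length * (100.0 - identity_percent) / 100.0

def shifts_frame(indel_length: int) -> bool:
    """An indel shifts the downstream reading frame unless it is a multiple of 3."""
    return indel_length % 3 != 0

print(expected_differences(99.8))  # about 59.4 positions
print(shifts_frame(29))            # True: 29 = 9 * 3 + 2
```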


Phylogenetic analysis of the S gene sequences of human and animal SARS-CoV appeared to rule out human-to-animal transmission, implying that the infected animals may have acquired the virus from a true animal source that has yet to be identified (80). This was also supported by a host-association analysis of coronaviruses based on the nucleocapsid gene (39), which indicated that host shifts have played an important role in the evolution of the virus and its hosts. The occurrence of an avian-to-mammal host shift supports the hypothesis that SARS-CoV emerged from an unknown animal coronavirus.

Reverse genetics system 

The reverse genetics system, a very useful tool for studying the functions of viral proteins and their mutations, was first described in the Coronaviridae by Masters' group (82) for MHV. Less than six months after the first identification of SARS-CoV (2), Yount et al (83) developed a reverse genetics system for this coronavirus using a full-length cDNA clone of the Urbani strain, assembled from six component clones spanning the entire genome. Following in vitro transcription and transfection of the resulting RNA transcripts, the rescued recombinant virus was found to replicate in the same way as the wild type, and the marker mutations introduced into the clone were identified in the recovered virus. The success of the experiment offers hope for the development of live attenuated vaccine strains against SARS-CoV (9).
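Assembling a full-length clone from six component cDNAs relies on each junction being joinable in only one way, so that the fragments ligate in the correct order and orientation. The Python sketch below checks that property for fragments described by labels for their left and right junctions. The fragment names A-F are used for illustration, the junction labels are hypothetical, and the sketch does not model any particular restriction enzyme or the base pairing of overhangs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    name: str
    left: str   # junction label at the 5' side ("" = genome 5' end)
    right: str  # junction label at the 3' side ("" = genome 3' end)

def assembly_order(fragments: list[Fragment]) -> list[str]:
    """Return fragment names in ligation order, or raise if the design is ambiguous."""
    by_left: dict[str, Fragment] = {}
    for frag in fragments:
        if frag.left in by_left:
            raise ValueError(f"junction {frag.left!r} is not unique")
        by_left[frag.left] = frag
    if "" not in by_left:
        raise ValueError("no fragment carries the genome 5' end")

    order: list[str] = []
    current = by_left[""]
    while True:
        if current.name in order:
            raise ValueError("junctions form a loop")
        order.append(current.name)
        if current.right == "":
            break
        if current.right not in by_left:
            raise ValueError(f"nothing ligates to {current.name} at {current.right!r}")
        current = by_left[current.right]

    if len(order) != len(fragments):
        raise ValueError("some fragments are left out of the chain")
    return order

# Hypothetical six-fragment design, listed out of order.
frags = [
    Fragment("D", "CAT", "AGG"), Fragment("A", "", "GCA"),
    Fragment("F", "CTC", ""),    Fragment("C", "TTG", "CAT"),
    Fragment("E", "AGG", "CTC"), Fragment("B", "GCA", "TTG"),
]
print(assembly_order(frags))  # ['A', 'B', 'C', 'D', 'E', 'F']
```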


SARS and human leukocyte antigen (HLA) system

There is considerable scientific interest in identifying the genetic factors responsible for the unusual susceptibility of some ethnic groups to SARS-CoV. A molecular survey of the HLA system, a common approach for identifying genetic associations with autoimmune disorders and emerging infectious diseases, was conducted in Taiwan during the SARS epidemic (84). Using PCR amplification plus sequence-specific oligonucleotide probing (PCR-SSOP), the researchers determined the HLA genotypes of SARS patients. Healthy, unrelated Taiwanese were used as controls, and the HLA genotypes of SARS patients were compared with those of probable cases and of high-risk, uninfected health care workers. The results indicated a higher frequency of the HLA-B*4601 allele in severe SARS cases, which may explain the severity of SARS in these patients. As stated in the report, this genotype is common in Southern Han Chinese, Singaporeans and Vietnamese, but not in indigenous Taiwanese, among whom no SARS case was reported. Such findings may help explain the unusual distribution of the SARS epidemic in southern China and Southeast Asia.


Diagnosis of the SARS-CoV 

Work on developing a laboratory diagnosis of SARS-CoV began immediately after the SARS outbreak, although an ideal diagnostic system is still being sought. Numerous protocols have been developed for the diagnosis of infectious viral diseases; most are PCR-based, and the rest depend on a measurable immune response. Several factors affect the choice of diagnostic technique, including time, the availability of equipment and expertise, the biological nature of the available samples, and the required data output format (Table 2; ref. 10). The presence of the virus can be detected by molecular testing such as PCR, or by virus isolation. Tests based on the immune response mainly detect SARS-CoV-specific antibodies, for example by enzyme-linked immunosorbent assay (ELISA).


Table 2 Summary of Properties of Different Diagnostic Methods

Features/Methods                 RT-PCR            Virus isolation   ELISA             IFA               Microarray
Specificity                      High              High              Relatively lower  Relatively lower  Relatively lower
Sensitivity                      Not very high     Low               High              High              Not very high
Valid duration of +ve result*    d1-d10            d1-d10            d21-d31           d1-d31            d1-d10
Valid duration of -ve result*    N/A               N/A               d21-d31           d21-d31           N/A
Convenience†                     Not very high     Moderate          High              Not very high     Low
Speed                            Relatively lower  Slow              High              High              High

* A result is defined as valid on the stated days after the onset of fever (d = day).
† Convenience reflects the requirement for expensive equipment and skilled labor.
IFA, immunofluorescence assay.
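The validity windows in Table 2 can be read as a simple lookup. The sketch below assumes that the window bounds are inclusive and that "N/A" means the table gives no interpretable window. It encodes the windows as a dictionary and checks whether a result obtained on a given day after fever onset falls inside the window. The names `VALID_WINDOWS` and `result_is_interpretable` are hypothetical and are used only for illustration.

```python
# Valid windows (days after onset of fever, assumed inclusive) transcribed
# from Table 2. None stands for "N/A": the table gives no valid window.
VALID_WINDOWS = {
    "RT-PCR":          {"positive": (1, 10),  "negative": None},
    "Virus isolation": {"positive": (1, 10),  "negative": None},
    "ELISA":           {"positive": (21, 31), "negative": (21, 31)},
    "IFA":             {"positive": (1, 31),  "negative": (21, 31)},
    "Microarray":      {"positive": (1, 10),  "negative": None},
}


def result_is_interpretable(method: str, positive: bool, day_after_onset: int) -> bool:
    """Return True if a result of this sign falls inside Table 2's window."""
    window = VALID_WINDOWS[method]["positive" if positive else "negative"]
    if window is None:
        return False  # the table defines no valid window for this result
    start, end = window
    return start <= day_after_onset <= end


print(result_is_interpretable("ELISA", False, 10))  # False: too early to exclude infection
print(result_is_interpretable("ELISA", False, 25))  # True
```

For example, an ELISA-negative serum taken on day 10 falls outside the window. This matches the point below that a negative antibody result within the first three weeks cannot rule out infection.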


Molecular assays 

Molecular diagnostic techniques have advanced in recent years, and such rapid and sensitive methods allow efficient monitoring of infectious viral diseases. For SARS, the first genetic fragment of the virus was generated by reverse transcriptase-polymerase chain reaction (RT-PCR; ref. 2). Two RT-PCR protocols were then developed by two WHO SARS network laboratories (http://www.who.int/csr/sars/primers/en). The sensitivities of these assays were shown to be at least 50%, with the highest percentage found in throat swab specimens (85). No false positives were found in these assays.
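Sensitivity and specificity come from a 2 × 2 comparison of assay results against true infection status. The minimal sketch below uses hypothetical counts, chosen only to show how "at least 50% sensitivity" and "no false positives" translate into the two ratios. The counts are not data from ref. 85. The `ConfusionCounts` class is an illustrative helper, not part of any published protocol.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    true_pos: int   # infected specimens the assay reports positive
    false_neg: int  # infected specimens the assay misses
    true_neg: int   # uninfected specimens the assay reports negative
    false_pos: int  # uninfected specimens the assay reports positive

    def sensitivity(self) -> float:
        """Fraction of infected specimens that are detected."""
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        """Fraction of uninfected specimens correctly reported negative."""
        return self.true_neg / (self.true_neg + self.false_pos)


# Hypothetical counts for illustration only.
counts = ConfusionCounts(true_pos=25, false_neg=25, true_neg=40, false_pos=0)
print(counts.sensitivity())  # 0.5
print(counts.specificity())  # 1.0
```

With zero false positives, specificity is 1.0 however many negatives are tested. Sensitivity, by contrast, depends directly on how many infected specimens the assay misses.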

The first rapid real-time assay was developed based on the most conserved region of the ORF1b gene sequence (86, 87). A person is confirmed to be infected with the SARS-CoV if viral RNA is detected by two different PCR assays, in two aliquots of the specimen, or with two sets of primers (http://www.cdc.gov/ncidod/sars/specimen_collection_sars2.htm). The second generation of this test protocol can detect the virus within 10 days after the onset of fever (87-89). It provided 80% sensitivity and 100% specificity in testing 50 nasopharyngeal aspirate (NPA) samples collected from SARS patients within three days after the onset of the disease (87). To further increase sensitivity, a one-step real-time RT-PCR has recently been developed (89). Specificity of the PCR can be enhanced by adding a second amplification target, a fragment of the viral N gene (89), whose subgenomic mRNA is theoretically the most abundant one produced during transcription (13). Because the technique measures viral load in real time during antiviral treatment, the efficacy of the therapy can be evaluated (10). However, although the PCR assays are powerful, they are also technically demanding and labor intensive (10).
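Real-time PCR usually turns a threshold cycle (Ct) into a viral load through a standard curve. The curve is a straight-line fit of Ct against log10 input copies for a dilution series of known standards. The sketch below shows that general technique. It assumes an ordinary least-squares fit and uses a hypothetical dilution series; the source does not say how refs. 10 and 89 calibrated their assays. The function names are illustrative.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    if n != len(ct_values) or n < 2:
        raise ValueError("need at least two paired standards")
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate input copies for a sample's Ct."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 = perfect doubling)."""
    return 10 ** (-1 / slope) - 1


# Hypothetical standards: 10^2 to 10^6 copies, near-ideal slope of -3.32.
standards_log10 = [2, 3, 4, 5, 6]
standards_ct = [33.36, 30.04, 26.72, 23.40, 20.08]
slope, intercept = fit_standard_curve(standards_log10, standards_ct)
print(round(slope, 2), round(intercept, 2))                # -3.32 40.0
print(round(copies_from_ct(30.04, slope, intercept)))      # 1000
```

A slope near -3.32 corresponds to roughly 100% amplification efficiency. Repeating the estimate on serial samples taken during treatment is what allows the viral load to be followed over time.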

Microarray technology for viral discovery was first described by Wang et al in 2002 (90). Its capacity for rapid, high-throughput screening of unknown viral pathogens gives it great potential as a diagnostic tool. To identify the SARS-CoV, Wang et al (11) used an improved microarray platform, comprising conserved 70-mers from each of 1,000 viruses, to characterize the coronavirus genome. In the experiment, the first hybridizing oligonucleotides recognized were four from the Astroviridae that share the s2m motif and three from the Coronaviridae that share a conserved ORF1ab fragment. The sequence recovered from the surface of the microarray further confirmed that the virus is a member of the coronavirus family. The identity of the SARS-CoV was confirmed within 24 hours, and the novel virus was partially sequenced a few days later. This technique demonstrated a rapid and accurate means of characterizing unknown viruses from genetic data.
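The first step of the microarray read-out described above groups the oligonucleotides that hybridize above background by the virus family they were designed from. The toy sketch below shows only that tallying step. It assumes hypothetical probe names, signal values and a single fixed threshold; the published analysis was more involved. The function `rank_families` is illustrative.

```python
from collections import Counter


def rank_families(probe_signals, probe_family, threshold):
    """Count above-threshold probes per virus family, most hits first."""
    hits = [probe for probe, signal in probe_signals.items() if signal >= threshold]
    return Counter(probe_family[probe] for probe in hits).most_common()


# Hypothetical probe annotations and normalized hybridization signals.
probe_family = {
    "astro_s2m_1": "Astroviridae", "astro_s2m_2": "Astroviridae",
    "astro_s2m_3": "Astroviridae", "astro_s2m_4": "Astroviridae",
    "corona_orf1ab_1": "Coronaviridae", "corona_orf1ab_2": "Coronaviridae",
    "corona_orf1ab_3": "Coronaviridae",
    "picorna_1": "Picornaviridae",
}
probe_signals = {name: 5.0 for name in probe_family}
probe_signals["picorna_1"] = 0.3  # below threshold: no hybridization

print(rank_families(probe_signals, probe_family, threshold=2.0))
# [('Astroviridae', 4), ('Coronaviridae', 3)]
```

In this toy case, a raw tally ranks the Astroviridae first because their probes share the s2m motif. That is why a tally alone is not conclusive, and why the sequence recovered from the array was needed to place the virus among the coronaviruses.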

Virus isolation 

Virus isolation by cell culture is a traditional technique used extensively in virology. Coronavirus present in the clinical specimens of SARS patients was detected by inoculating the specimens into cell cultures, allowing infection and subsequent isolation of the virus. Fetal rhesus kidney (FRhK-4; ref. 2) and Vero cells (3) were found to be susceptible to SARS-CoV infection. After isolation, the pathogen was identified as the SARS-CoV by further tests, such as electron microscopy, RT-PCR, or immunofluorescent viral antigen detection. Virus isolation is the only means of detecting live virus in tissue. The method is generally used only for preliminary identification of an unknown pathogen, because the procedure requires skilled technicians and is time consuming. The need for infectious virus, and the variable length of time for which live virus persists, make such assays harder still, but they nevertheless have very high specificity.

Enzyme-linked immunosorbent assay (ELISA)

The N protein is usually chosen as the antigen for anti-coronavirus antibody detection assays (91,92), because it is believed to be a predominant antigen of the SARS-CoV (35,36). It is also the only viral protein recognized by acute and early convalescent sera from patients recovering from SARS (29). The S protein of the SARS-CoV has also been reported to elicit antibodies in humans (29), but at a much lower titer than the N protein (35,36).

Assays based on the presence of SARS-CoV antibodies are considered valid only for specimens obtained more than three weeks after the onset of fever (88,89), although some patients have detectable SARS-CoV antibodies within 14 days of the onset of illness. Accordingly, a negative result (absence of SARS-CoV antibodies) within the first three weeks cannot establish that the patient is free of the virus. ELISA has nonetheless been regarded as a good standard for rapid diagnosis of SARS (85). Seroconversion from negative to positive, or a four-fold rise in antibody titer from acute to convalescent serum, indicates recent infection (http://www.who.int/csr/sars/diagnostictests/en/).
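The paired-serum criterion above can be written as a small decision function. This sketch makes several assumptions that do not come from the WHO text. Titers are expressed as reciprocal dilutions (1:40 becomes 40, and 0 means no reactivity). A four-fold rise is counted only when the acute serum is already positive; otherwise the case falls under seroconversion. The positivity cutoff of 20 is a hypothetical assay threshold. The function name `indicates_recent_infection` is illustrative.

```python
def indicates_recent_infection(acute_titer: int, convalescent_titer: int,
                               positive_cutoff: int = 20) -> bool:
    """Seroconversion or a >= 4-fold titer rise between paired sera.

    Titers are reciprocal dilutions (1:40 -> 40; 0 = no reactivity).
    positive_cutoff is a hypothetical assay threshold, not a WHO value.
    """
    acute_positive = acute_titer >= positive_cutoff
    convalescent_positive = convalescent_titer >= positive_cutoff
    # Negative acute serum that turns positive in convalescence.
    seroconversion = not acute_positive and convalescent_positive
    # Already-positive acute serum whose titer rises at least four-fold.
    fourfold_rise = acute_positive and convalescent_titer >= 4 * acute_titer
    return seroconversion or fourfold_rise


print(indicates_recent_infection(0, 80))    # True: seroconversion
print(indicates_recent_infection(40, 160))  # True: four-fold rise
print(indicates_recent_infection(40, 80))   # False: only a two-fold rise
```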

Molecular Epidemiology and Evolution of SARS

The epidemiology of SARS has been investigated extensively since the outbreak began in Guangdong in November 2002 (1), initially with traditional epidemiological methods. Molecular epidemiology traces disease transmission through phylogenetic analysis of viral nucleotide sequences, which can quickly identify transmission routes and help monitor them (93).
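Phylogenetic analyses such as the neighbor-joining tree of spike genes in Fig. 3 start from pairwise genetic distances between aligned sequences. The sketch below uses uncorrected p-distances, which is an assumption: the substitution model and bootstrap settings used with MEGA 2 are not stated here. It joins nodes by the standard neighbor-joining Q criterion and returns an unrooted Newick string. All names are illustrative, and ties are broken by taking the first pair examined.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def distance_matrix(seqs: dict) -> dict:
    """Pairwise p-distances keyed by frozenset({name_a, name_b})."""
    return {frozenset((a, b)): p_distance(seqs[a], seqs[b])
            for a, b in combinations(seqs, 2)}


def _fmt(x: float) -> str:
    return f"{x:.3g}"


def neighbor_joining(names, dist) -> str:
    """Build an unrooted tree by neighbor joining; return a Newick string."""
    if len(names) < 3:
        raise ValueError("need at least three sequences")
    d = dict(dist)
    nodes = list(names)

    def get(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all the others.
        r = {a: sum(get(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair that minimises the Q criterion (first pair wins ties).
        best = None
        for a, b in combinations(nodes, 2):
            q = (n - 2) * get(a, b) - r[a] - r[b]
            if best is None or q < best[0]:
                best = (q, a, b)
        _, a, b = best
        # Branch lengths from the two joined nodes to their new parent.
        la = get(a, b) / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = get(a, b) - la
        parent = f"({a}:{_fmt(la)},{b}:{_fmt(lb)})"
        # Distances from the new parent to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((parent, k))] = (get(a, k) + get(b, k) - get(a, b)) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [parent]

    # Three nodes left: attach them to one central node.
    a, b, c = nodes
    la = (get(a, b) + get(a, c) - get(b, c)) / 2
    lb = (get(a, b) + get(b, c) - get(a, c)) / 2
    lc = (get(a, c) + get(b, c) - get(a, b)) / 2
    return f"({a}:{_fmt(la)},{b}:{_fmt(lb)},{c}:{_fmt(lc)});"


# Hypothetical tree-additive distances for four isolates A-D.
pairs = {("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 3,
         ("B", "C"): 6, ("B", "D"): 4, ("C", "D"): 4}
dist = {frozenset(p): v for p, v in pairs.items()}
print(neighbor_joining(["A", "B", "C", "D"], dist))
# (C:3,D:1,(A:1,B:2):1);
```

For real alignments, `distance_matrix(seqs)` would supply `dist`. Because the example distances are exactly additive, neighbor joining recovers the generating tree and its branch lengths.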
In coronaviruses, variations in the spike protein can drastically affect viral entry, pathogenesis (94), the antiviral immune response (29), virulence (95), and cellular (6) or even species tropism (7). The S gene has been used as a target for genotyping most coronaviruses, such as human coronaviruses (96) and IBV (97). A study of the N-terminal region of the SARS-CoV spike protein (98) reached conclusions similar to those of conventional epidemiological methods. The investigators collected S1 gene sequences from SARS patients in Hong Kong and Guangdong during February-April 2003, mainly by direct sequencing of RT-PCR products derived from clinical specimens. They then compared these phylogenetically with 27 additional sequences available from GenBank.

Most of the Hong Kong viruses, including those from a large outbreak in the Amoy Garden high-rise apartment block, clustered with a single index case who came from Guangdong to Hong Kong in late February (Figure 3). These viruses belong to the same lineage as the virus derived from the Hong Kong index case. By the same phylogenetic analysis, the outbreaks in Canada, Singapore, Taiwan and Vietnam also derived from this initial lineage. A number of viruses from early patients fell outside the major lineage and formed distinct clusters. This implies that multiple introductions of the virus occurred, although these viruses did not cause large-scale outbreaks. Viral sequences identified in Guangdong and Beijing are genetically more diverse (1,98), implying that the SARS-CoV had been circulating there for a while before its introduction to Hong Kong. The Hong Kong index case initiated the first super-spreading incident, which affected 12 other patients. Whether this happened simply by chance, or because the viruses in that patient were particularly able to initiate super-spreading events, still needs further investigation.

Apart from findings that indicate possible transmission routes, transitional isolates possessing characteristics of both lineages were also identified. Ruan et al (99) and Tsui et al (100) performed similar analyses comparing the full genome sequences of different SARS-CoV isolates. They independently identified some of the same variations as Guan et al (98). Chiu et al (101) have recently identified a nucleotide substitution in the S gene that is unique to the Taiwan isolates and was linked to the Hong Kong index case. Sequence comparison of the Amoy Garden isolates revealed no significant variation within the S1 gene or across the whole genome. This implies that non-viral factors may have contributed to the unusual transmission and clinical presentation of SARS in this cluster of high-rise apartments (98,102). In summary, the transmission routes of the SARS-CoV in different countries and areas correlate well with traditional epidemiological findings, which shows that molecular epidemiological techniques can successfully trace the history of virus transmission.
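Finding a substitution unique to one group of isolates, as Chiu et al did for the Taiwan isolates, amounts to scanning an alignment for sites where every member of the group shares a base that no other isolate carries. The sketch below assumes a pre-computed alignment of equal-length sequences and uses a hypothetical six-base alignment. It does not reproduce the published analysis. The function name `group_specific_sites` is illustrative.

```python
def group_specific_sites(alignment: dict, group: set) -> list:
    """Return (1-based position, base) for sites where every member of
    `group` carries the same base and no sequence outside it does."""
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    inside = [name for name in alignment if name in group]
    outside = [name for name in alignment if name not in group]
    if not inside or not outside:
        raise ValueError("group must be a non-empty proper subset")
    sites = []
    for i in range(lengths.pop()):
        bases_in = {alignment[name][i] for name in inside}
        if len(bases_in) != 1:
            continue  # group members disagree at this site
        base = next(iter(bases_in))
        if all(alignment[name][i] != base for name in outside):
            sites.append((i + 1, base))
    return sites


# Hypothetical toy alignment.
alignment = {
    "hk_1": "ATGCTA",
    "hk_2": "ATGCTA",
    "tw_1": "ATGTTA",
    "tw_2": "ATGTTC",
}
print(group_specific_sites(alignment, {"tw_1", "tw_2"}))
# [(4, 'T')]
```

Position 6 is not reported because the two group members differ there. The same scan, applied to the Amoy Garden isolates against all others, would return an empty list if they carried no distinguishing variation.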

Concerning viral evolution, Zeng et al (103) performed a linear regression analysis to estimate when the most recent common ancestor of the SARS-CoV appeared. This approach has been applied successfully to date the ancestral sequence of human immunodeficiency virus (HIV; ref. 104). For the SARS-CoV, it places the ancestral sequence in late 2002. These preliminary findings provide important information for tracing the origin of the SARS-CoV and monitoring its spread.
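One common form of such a regression plots each isolate's genetic distance from the root of the tree against its sampling date. The fitted slope estimates the substitution rate, and the date at which the fitted line reaches zero divergence estimates when the common ancestor existed. The exact procedure of Zeng et al is not described here, so this is a generic sketch. It assumes root-to-tip distances are already available from a rooted tree and uses hypothetical dates and distances. The function name `estimate_tmrca` is illustrative.

```python
def estimate_tmrca(sampling_dates, root_to_tip):
    """Least-squares fit of divergence = rate * (date - tmrca).

    Returns (rate, tmrca), where tmrca is the x-intercept of the fitted
    line, i.e. the date at which divergence from the root is zero.
    """
    n = len(sampling_dates)
    if n != len(root_to_tip) or n < 2:
        raise ValueError("need at least two dated sequences")
    mean_t = sum(sampling_dates) / n
    mean_d = sum(root_to_tip) / n
    s_tt = sum((t - mean_t) ** 2 for t in sampling_dates)
    s_td = sum((t - mean_t) * (d - mean_d)
               for t, d in zip(sampling_dates, root_to_tip))
    rate = s_td / s_tt
    if rate <= 0:
        raise ValueError("no positive temporal signal in the data")
    # x-intercept written relative to the means for numerical stability
    tmrca = mean_t - mean_d / rate
    return rate, tmrca


# Hypothetical isolates: sampling dates in decimal years and
# root-to-tip distances in substitutions per site.
dates = [2003.1, 2003.2, 2003.3, 2003.4]
distances = [0.002, 0.003, 0.004, 0.005]
rate, tmrca = estimate_tmrca(dates, distances)
print(round(rate, 6), round(tmrca, 3))  # 0.01 2002.9
```

The estimate is only as good as the temporal signal in the data. A narrow sampling window, such as the few months of the 2003 epidemic, leaves the extrapolated intercept quite uncertain.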

Immunity, Vaccination and Antiviral Drug Design

Current knowledge of coronavirus immunity has been acquired mainly from research on animal coronaviruses. Clinical observations have shown that both humoral and cell-mediated immune responses may be necessary against SARS-CoV infection (105). T cell (CD3+, CD4+ and CD8+) depletion was reported in early infection, with levels returning to normal as the disease improved (106). IgG antibody could be detected on the 7th day after the onset of symptoms and remained at high titer for at least three months (107). Another report indicated that the virus was still detectable by RT-PCR in respiratory and stool specimens more than 40 days after presentation, but could no longer be cultured (108). This implies that antibody production could be stimulated rapidly and might restrict the virus infection. However, it has also been reported in fowl and feline coronaviral diseases that low-level antibody may exacerbate disease (109). Further investigation of the immune response of SARS patients is therefore important for vaccine development and disease control.

Concerning candidate targets for vaccine development, the S1 subunit of the spike protein has been identified as the host-protective antigen and used as a vaccine candidate in other coronaviruses (110). An extensive structural analysis of the corresponding protein in SARS is thus desirable. The SARS-CoV functional receptor has been identified (30) and the receptor-binding domain on the spike protein mapped (31). Given this, a subunit vaccine targeting the receptor-binding domain may be promising, as may killed or attenuated vaccines prepared using an ACE2-expressing cell line (30).

Fig. 3 Phylogenetic analysis of 169 SARS-CoV spike genes. Unrooted trees were constructed from the optimal alignment by the neighbor-joining method using MEGA 2. Numbers at the nodes indicate bootstrap values in percentage. Branch length shows genetic distance with reference to the horizontal scale bar. All sample names were hidden for convenience of display, except the index case isolate HKU-33 (gray) and the subcluster transition isolates (dark). The locations of these isolates on the tree are pinpointed by dots beside their names. The hypothetical common ancestors of the subclusters are highlighted as described in the bottom right of the figure.

Antiviral drugs represent an alternative to vaccination as an anti-SARS strategy. Chemicals that inhibit SARS-CoV replication-related proteins, for example by blocking the enzymatic activity of 3CLpro, have been considered as anti-SARS-CoV drug candidates. An extensive structural analysis of 3CLpro, encoded by nsp5 in ORF1a, was performed (28, 111). The 3CLpro structure showed considerable conservation of the substrate-binding sites relative to the main proteinase of TGEV (transmissible gastroenteritis virus), with evidence that the enzyme retains its proteolytic activity (111). Another group, however, suggested that the enzyme might be inactive in vitro (112). On the basis of their results, the authors of the first study suggested that rhinovirus 3Cpro inhibitors might be useful in anti-SARS therapy. Two months later, a research group from the US studied the interaction of two chemicals (KZ7088 and the AVLQSGFR octapeptide) with 3CLpro (113). This work further highlighted the importance of the main proteinase as a target for antiviral drug design. Fan et al (114) provided valuable additional information, concluding that only the dimeric form of 3CLpro is active and that the proteinase-substrate interaction can be speeded up if the substrate contains more beta-sheet-like structure. Recently, the crystal structure of 3CLpro was reported by Yang et al (115). The 3CLpro crystal underwent conformational changes under different pH conditions while in complex with a specific inhibitor. A serine-protease-like fold with a Cys-His pair at the active site was recognized.

In addition, von Grotthuss et al (116) modeled the structure of the 2'-O-MT domain located in nsp16 using the 3D-Jury system, with high reliability (3D-Jury score > 100). The conservation of the unique K-D-K-E tetrad residues in this domain pointed to an mRNA cap methylation function, suggesting an alternative target for antiviral drug design. Besides the main protease, blocking virus entry should be considered as well. Structural analysis of the S2 domain of the SARS-CoV S protein, which plays a role in fusion of the virus with the host cell, revealed sequence motifs conserved with the well-studied gp41 protein of HIV-1 and other viruses with class I transmembrane domains (27). Such a structure may be another target for drug design.

Conclusion 

The collaborative efforts of the global scientific community have provided invaluable insights into the molecular biology of the SARS-CoV. A rapid and accurate method of diagnosis, developed from the molecular findings, has helped to identify SARS patients at an early stage of the disease. This has given national authorities valuable information for monitoring the spread of the disease and taking effective quarantine measures, and has contributed to the understanding of the syndrome's clinical presentations. The elucidation of the molecular biology of the SARS-CoV has provided a foundation for vaccine design and narrowed down the targets for large-scale, high-throughput drug screening programs for antiviral therapy. These advances helped the global community to contain the spread of SARS within four months of its first identification.

However, much remains to be discovered about this novel coronavirus, and it may yet pose a serious threat. Unlike other recently identified viral diseases such as Ebola and West Nile virus, SARS-CoV does not seem to need a visible vector for transmission: a tiny, invisible respiratory droplet appears sufficient to infect another person (117). The nearly undetectable symptoms of the recently confirmed SARS case in Singapore suggest that the virus may continue to circulate undetected (65). The possibility that common domestic animals are also a reservoir for the virus further complicates the struggle to contain and ultimately eradicate this disease. In these respects, sensitive, accurate and rapid diagnosis plays an extremely important role in limiting the spread of the disease, especially in the developing world and in densely populated countries. Fortunately, the aggressive quarantine measures imposed by the WHO proved effective in containing the outbreak, and the experience gained in the last SARS outbreak has prepared us to face another outbreak with some confidence. Nevertheless, nobody can predict exactly when an effective vaccine or antiviral drug will be developed. All that can be said is that, given our growing knowledge of the molecular epidemiology and evolution of the virus, the successful development of countermeasures to SARS is very possible.